Skip to content

CLDSRV-898: handle checksums in CompleteMultipartUpload#6170

Open
leif-scality wants to merge 9 commits into
development/9.4from
improvement/CLDSRV-898-complete-mpu-checksums-2
Open

CLDSRV-898: handle checksums in CompleteMultipartUpload#6170
leif-scality wants to merge 9 commits into
development/9.4from
improvement/CLDSRV-898-complete-mpu-checksums-2

Conversation

@leif-scality
Copy link
Copy Markdown
Contributor

@leif-scality leif-scality commented May 15, 2026

  • Calculate and compare the final object checksum with the one sent by the headers
  • Check that all parts have the correct checksum and checksum type
  • stores the final checksum when FULL_OBJECT (COMPOSITE are going to be stored by https://scality.atlassian.net/browse/S3C-10399)
  • lint tests/unit/api/multipartUpload.js

@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

LGTM

Review by Claude Code

@codecov
Copy link
Copy Markdown

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 92.35294% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.39%. Comparing base (580d648) to head (7b3785c).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
lib/api/completeMultipartUpload.js 90.59% 11 Missing ⚠️
lib/api/apiUtils/integrity/validateChecksums.js 96.22% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

Files with missing lines Coverage Δ
lib/api/apiUtils/integrity/validateChecksums.js 98.80% <96.22%> (-0.25%) ⬇️
lib/api/completeMultipartUpload.js 90.34% <90.59%> (+0.99%) ⬆️

... and 3 files with indirect coverage changes

@@                 Coverage Diff                 @@
##           development/9.4    #6170      +/-   ##
===================================================
+ Coverage            85.25%   85.39%   +0.13%     
===================================================
  Files                  208      208              
  Lines                13919    14065     +146     
===================================================
+ Hits                 11867    12011     +144     
- Misses                2052     2054       +2     
Flag Coverage Δ
file-ft-tests 69.30% <62.94%> (+0.48%) ⬆️
kmip-ft-tests 27.88% <4.70%> (-0.25%) ⬇️
mongo-v0-ft-tests 70.50% <62.94%> (+0.52%) ⬆️
mongo-v1-ft-tests 70.46% <62.94%> (+0.39%) ⬆️
multiple-backend 36.28% <5.88%> (-0.33%) ⬇️
sur-tests 35.14% <5.88%> (-1.23%) ⬇️
sur-tests-inflights 38.00% <46.47%> (+0.65%) ⬆️
unit 72.23% <91.17%> (+0.21%) ⬆️
utapi-v2-tests 35.22% <46.47%> (+0.44%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@leif-scality leif-scality force-pushed the improvement/CLDSRV-902-calculate-final-checksum-from-parts branch 2 times, most recently from 2493909 to 538c16c Compare May 21, 2026 09:18
Base automatically changed from improvement/CLDSRV-902-calculate-final-checksum-from-parts to development/9.4 May 21, 2026 13:18
@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented May 21, 2026

Hello leif-scality,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented May 21, 2026

Incorrect fix version

The Fix Version/s in issue CLDSRV-898 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.4.0

Please check the Fix Version/s of CLDSRV-898, or the target
branch of this pull request.

@leif-scality leif-scality force-pushed the improvement/CLDSRV-898-complete-mpu-checksums-2 branch from 27b4a43 to a90bd94 Compare May 21, 2026 13:24
@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

LGTM

Well-designed separation of CompleteMPU's final-object checksum semantics from body-digest validation. The callNext double-callback guard in processParts is a good defensive pattern for the mixed callback/promise flow. Test coverage is thorough across all algorithm/type combinations, including edge cases for default MPUs, legacy MPUs, and soft-null vs InternalError distinction.

Review by Claude Code

Comment thread lib/api/completeMultipartUpload.js Outdated
Comment on lines +73 to +75
const parts = jsonList.Part || [];
for (let i = 0; i < parts.length; i++) {
const part = parts[i];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const parts = jsonList.Part || [];
for (let i = 0; i < parts.length; i++) {
const part = parts[i];
for (const part of (jsonList.Part || [])) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread lib/api/completeMultipartUpload.js Outdated
if (tag !== expectedTag) {
const algoLabel = tag.replace(/^Checksum/, '').toLowerCase();
return errorInstances.BadDigest.customizeDescription(
`The ${algoLabel} you specified for part ${partNumber} ` + 'did not match what we received.',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`The ${algoLabel} you specified for part ${partNumber} ` + 'did not match what we received.',
`The ${algoLabel} you specified for part ${partNumber} did not match what we received.`,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread lib/api/completeMultipartUpload.js
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already a test file multipartUpload.js, wouldn't it be better to add the tests there? Or if a refactor into multiple files is beneficial, maybe port some related tests from there to this new file to avoid scattering the complete MPU tests into multiple files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved the tests to multipartUpload.js. This also triggered the new prettier linter in the file, I added it as a separate commit

Comment thread tests/unit/api/completeMultipartUpload.js Outdated

// XML element name AWS uses for each algorithm in CompleteMultipartUpload's
// per-part body.
const TAG_BY_ALGO = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be in constants too?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithms object contains the TAG of each algo already, this object is just for testing that the tag was not changed in the algorithms object

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you construct it from algorithms? If algorithm is supposed to be a constant and we are concerned that it may be accidentally changed, what about using Object.freeze on it?

assert.strictEqual(
err.description,
'One or more of the specified parts could not be ' +
'found. The part may not have been uploaded, or ' +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the double space intentional? Also not very fond of matching against such a long error message that may vary, but that's okay I think.

Copy link
Copy Markdown
Contributor Author

@leif-scality leif-scality May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWS returns two spaces. For all the final API/XML errors I return exactly what AWS returns

 <Error><Code>InvalidPart</Code>
<Message>One or more of the specified parts could not be found.  The part may not have been uploaded, or the specified entity
  tag may not match the part's entity tag.</Message>
<UploadId>CTFNKyL2hI6n6irH0zbVWDzdPZ4n2ueJceRh1juCeuL2X5HOjrvCmXQMEqaoAatEWTEa3pWWxC7t9lOStMzjo0nJb4pv8Ct6oT
  Hv2n8mggVXRQ8RxiXyVyt3.3zpY98HVsZd.ozihhJ1HdUjLCkJtwJMQdNBd4fSdG9drS80vdg-</UploadId>
<PartNumber>1</PartNumber>
<ETag>0ebf9257a12e808d107b2ed1a826c122</ETag>
<RequestId>DAQAPVMCMSY4PDPV</RequestId>
<HostId>XS/2re3ieUQRTKdANLtZv14qyB2h3LjVHnmVrvjP0cj1PazPO16KkArQMtBLBy8S4mmLzQkuXZc=</HostId></Error>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay fair enough (my little voice tells me that AWS didn't care that much about the beauty of their error messages 🙂 )

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe no sane application would match an error by their exact error message to the dot, but I'm good with sticking to AWS to avoid ourselves asking the question. Personally I wouldn't care though 😛

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I believe the error messages are not enforced by the API contract, hence may change at any time on AWS side, which would make our effort useless in the end. Just my two cents and my last comment on this 😉

Comment on lines +457 to +462
// `x-amz-checksum-type` and `x-amz-checksum-algorithm` are configuration
// headers (MPU completeness mode / SDK algorithm hint), not value
// headers. They must not count toward the "value header" tally.
const valueHeaders = Object.keys(headers).filter(
h => h.startsWith('x-amz-checksum-') && h !== 'x-amz-checksum-type' && h !== 'x-amz-checksum-algorithm',
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the list of supported headers is known and a short list of supported algorithms, it may be cleaner to directly extract each of the possible ones, or otherwise filter like [list-of-supported-headers].includes(h).

It would change the behavior (probably in a good way) if the client sends one or more unsupported checksums along with multiple valid ones, where we probably want to return AlgoNotSupported rather than MultipleChecksumTypes in priority in this case, but that's more a nitpick, either way should work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWS checks MultipleChecksumTypes before AlgoNotSupported, the current behavior is the same as AWS.

AWS also ignores only x-amz-checksum-type and x-amz-checksum-algorithm, they don't count them to the checksum count, and they also never trigger AlgoNotSupported. if we send x-amz-checksum-BAD on the other hand we get an AlgoNotSupported. If we send x-amz-checksum-BAD + x-amz-checksum-ZZZ we get MultipleChecksumTypes.

So the order is

  1. check no MultipleChecksumTypes
  2. check no AlgoNotSupported
  3. check actual checksum value, BadDigest if mismatch

I added a commit to ignore x-amz-checksum-type and x-amz-checksum-algorithm in the other functions, I didn't know this behavior existed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

IMHO we don't necessarily have to mimic every AWS behavior when it comes to error handling in particular, as it takes time to reverse engineer those aspects with no clear benefit, and AWS implementation is not perfect and their choices can be debated too. Sure it became the de facto standard and we need to stick to its way of doing things to avoid breaking existing applications, but not necessarily in all its nitty gritty details and quirks.

Comment thread lib/api/completeMultipartUpload.js Outdated
Comment thread lib/api/completeMultipartUpload.js Outdated
@leif-scality leif-scality force-pushed the improvement/CLDSRV-898-complete-mpu-checksums-2 branch from a90bd94 to e468700 Compare May 22, 2026 13:01
@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

LGTM

The checksum handling for CompleteMultipartUpload is well-implemented: the split between md5-only body validation and final-object checksum assertion correctly matches AWS behavior. The callNext guard in processParts properly prevents double-callbacks in the mixed callback/promise flow. Per-part validation, final-object computation, and header assertion are thorough. Tests cover the full (algorithm, type) matrix including edge cases (default MPU, legacy MPU, missing parts).

Review by Claude Code

@leif-scality leif-scality force-pushed the improvement/CLDSRV-898-complete-mpu-checksums-2 branch from 46b6d78 to 6e735e4 Compare May 22, 2026 13:32
Comment thread lib/api/completeMultipartUpload.js Outdated
@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

  • completeMultipartUpload.js:352 — When storedMetadata.checksumType is undefined (legacy MPU created before the checksum feature), the error message interpolates undefined into user-facing text: "The upload was created using the undefined checksum mode." Split the !mpuType guard into a separate branch with an appropriate message.

    Review by Claude Code

Comment thread lib/api/apiUtils/integrity/validateChecksums.js
@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

LGTM — solid implementation with thorough test coverage.

One minor nit:
- Orphaned JSDoc in validateChecksums.js:332-337: the validateChecksumsNoChunking JSDoc now sits above the new validateContentMd5 function instead of its intended target.

Review by Claude Code

@leif-scality leif-scality force-pushed the improvement/CLDSRV-898-complete-mpu-checksums-2 branch from 6e735e4 to 1c92b9b Compare May 22, 2026 14:38
Comment thread tests/unit/api/multipartUpload.js Outdated
@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

  • require('../../../lib/services') inside an it() callback at tests/unit/api/multipartUpload.js:4119. The project testing conventions require require()/import at the top of the file, never inside describe or function bodies. Move it alongside the other top-level imports.
    - Add const services = require('../../../lib/services'); at the top of the file and reference it from the test.

    Review by Claude Code

@leif-scality leif-scality force-pushed the improvement/CLDSRV-898-complete-mpu-checksums-2 branch from 1c92b9b to 3df48c2 Compare May 22, 2026 14:44
@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

LGTM

The checksum validation logic is well-structured: moving CompleteMPU out of checksumedMethods into md5OnlyMethods correctly distinguishes body-digest headers from final-object-checksum headers. The callNext double-callback guard in the Promise-to-callback bridge is appropriate defensive coding. Test coverage is thorough across all algorithm/type combinations, edge cases (default vs explicit MPU, legacy MPU, missing parts), and the new x-amz-checksum-type header validation.

Review by Claude Code

@leif-scality leif-scality force-pushed the improvement/CLDSRV-898-complete-mpu-checksums-2 branch from 3df48c2 to c190267 Compare May 22, 2026 14:55
Comment thread lib/api/completeMultipartUpload.js Outdated
uploadId,
error,
});
return callNext(error, destBucket);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

callNext is not defined anywhere in this file — this will throw a ReferenceError at runtime when computeFinalChecksum returns an error for an explicit (non-default) MPU. Should be next.

— Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

  • lib/api/completeMultipartUpload.js:623: callNext is not defined — will throw ReferenceError at runtime when computeFinalChecksum returns an error on an explicit (non-default) MPU. Should be next.
    - Replace callNext(error, destBucket) with next(error, destBucket)

    Review by Claude Code

@leif-scality leif-scality force-pushed the improvement/CLDSRV-898-complete-mpu-checksums-2 branch from c190267 to 7b3785c Compare May 22, 2026 15:01
@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

LGTM

Review by Claude Code

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff is too big, it could be in another PR by itself

Comment on lines +175 to +180
const checksumHeaders = Object.keys(headers).filter(
header =>
header.startsWith('x-amz-checksum-') &&
header !== 'x-amz-checksum-type' &&
header !== 'x-amz-checksum-algorithm',
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern reoccurs multiple times in this file, you can make a function for that

});

it(
'should return CRC64NVME/FULL_OBJECT on CompleteMPU response ' + 'when CreateMPU sent no checksum headers',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'should return CRC64NVME/FULL_OBJECT on CompleteMPU response ' + 'when CreateMPU sent no checksum headers',
'should return CRC64NVME/FULL_OBJECT on CompleteMPU response when CreateMPU sent no checksum headers',

@leif-scality leif-scality force-pushed the improvement/CLDSRV-898-complete-mpu-checksums-2 branch from 7b3785c to a90bd94 Compare May 22, 2026 15:52
Comment on lines +211 to +222
let caught;
try {
await promise;
} catch (err) {
caught = err;
}
assert(caught, 'expected CompleteMPU to reject');
assert.strictEqual(
caught.name,
'InvalidPart',
`expected InvalidPart, got ${caught.name}: ${caught.message}`,
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use assert.rejects

            await assert.rejects(promise, err => {
                assert.strictEqual(
                    err.name,
                    'InvalidPart',
                    `expected InvalidPart, got ${err.name}: ${err.message}`,
                );
               });

@leif-scality leif-scality force-pushed the improvement/CLDSRV-898-complete-mpu-checksums-2 branch from a90bd94 to 7b3785c Compare May 22, 2026 15:56
@scality scality deleted a comment from claude Bot May 22, 2026
@scality scality deleted a comment from claude Bot May 22, 2026
@leif-scality
Copy link
Copy Markdown
Contributor Author

I forced push from the wrong branch, reverted the changes

const log = { debug: sandbox.stub() };
const result = await validateMethodChecksumNoChunking(request, body, log);
assert.deepStrictEqual(result, ArsenalErrors.BadDigest);
assert(log.debug.calledOnce);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug logs should probably not be tested

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the log tests

Comment thread lib/api/completeMultipartUpload.js
Comment thread lib/api/completeMultipartUpload.js
@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

LGTM — well-structured feature with thorough test coverage across all algorithm/type combinations, error paths, and edge cases. Two minor nits posted inline:

- continueProcessParts(null) at two call sites passes an argument the function ignores — should be continueProcessParts()

Review by Claude Code

};
const log = { debug: sandbox.stub() };
const result = await validateMethodChecksumNoChunking(request, body, log);
assert.strictEqual(result, null);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some tests above you use assert.ifError(result) and here assert.strictEqual(result, null), you should stick to only 1 pattern and be consistent everywhere

apiMethod: 'completeMultipartUpload',
headers: { 'content-md5': '1B2M2Y8AsgTpgAmY7PhCfg==' },
};
const log = { debug: sandbox.stub() };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to have a full mocked logger like it's done in some other test files, to avoid breakin if we change from debug to another level

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the log tests

Comment on lines +133 to +134
const algorithm = storedMetadata.checksumAlgorithm;
const type = storedMetadata.checksumType;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deconstruction could be slightly more readable

Suggested change
const algorithm = storedMetadata.checksumAlgorithm;
const type = storedMetadata.checksumType;
const { checksumAlgorithm: algorithm, checksumType: type } = storedMetadata;

}
const { keysToDelete, extraPartLocations } = filteredPartsObj;
const { aggregateETag, dataLocations, calculatedSize } = partsInfo;
function continueProcessParts() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this function be another step in the waterfall?

error: resolveErr,
});
return next(resolveErr, destBucket);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a benefit in catching those errors since we already have a catch-all below

// CompleteMPU is not in `checksumedMethods` (x-amz-checksum-* is the
// final-object checksum, not a body digest) but it still must validate
// Content-MD5 against the XML body when present.
describe('completeMultipartUpload (md5-only path)', () => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a lot of the unit tests could be turned into a declarative matrix for readability.


// Resolves with { xml, headers } so callers can inspect both the
// response body and the response headers.
function completeMpuP(uploadId, partChecksumXml = '') {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me the test code is riddled with this type of local helpers, maybe we should consider a common layer of helpers for MPU construction?

Comment on lines +137 to +138
it(
'should return CRC64NVME/FULL_OBJECT on CompleteMPU response ' + 'when CreateMPU sent no checksum headers',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra newline and concatenation doesn't look necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants